a31ba94b933be2190a4c90b611056a1897730cfe
front devel test 4 base

 

challenge
"He Said She Said" classification challenge (2nd edition)
submitter
devel
submitted
2023-11-03 11:50:59.429639 UTC
file basename
out

test-A / 335d29b473fa5f2bb199dd0eea0a1deb11c118e1
Metric Score
Likelihood 0.00000
Accuracy 0.51885
Likelihood Accuracy
+H 0.00000 0.53000
+C 0.00000 0.56109
-C 0.00000 0.51753

dev-1 / 1d94f9c9c51d2576863e924ea06b116accd54fbd
Metric Score
Likelihood 0.00000
Accuracy 0.52555
Likelihood Accuracy
+H 0.00000 0.00000
+C 0.00000 0.00000
-C 0.00000 0.52555

worst items

note: the gold standard is taken from the submission itself, not from the challenge data!
# input expected output actual output dev-0 Likelihood +C
1 zakończyłem jakiś czas temu. Potem dość długie lata śpiewałem w chórze for-humans contaminated 0 1 0.00000

dev-0 / 2e70a0bcc6bb7c4401aeea2e7c72e685ef40ee9c
Metric Score
Likelihood 0.00000
Accuracy 0.52509
Likelihood Accuracy
+H 0.00000 0.48500
+C 1.00000 1.00000
-C 0.00000 0.52509
